Lexico: Extreme KV Cache Compression via Sparse Coding over Universal Dictionaries

arXiv.org Artificial Intelligence

We introduce Lexico, a novel KV cache compression method that leverages sparse coding with a universal dictionary. Our key finding is that the key-value (KV) cache in modern LLMs can be accurately approximated by sparse linear combinations of atoms from a small, input-agnostic dictionary of 4k atoms, enabling efficient compression across different input prompts, tasks, and models. Using orthogonal matching pursuit for sparse approximation, Lexico achieves flexible compression ratios through direct sparsity control. Lexico maintains 90-95% of the original performance while using only 15-25% of the full KV cache memory, outperforming both quantization and token eviction methods. Notably, Lexico remains effective in low-memory regimes where 2-bit quantization fails, achieving up to 1.7× better compression on LongBench and GSM8K while maintaining high accuracy.

Figure 1: Memory usage vs. performance of Lexico compared to other KV cache compression methods on GSM8K. The figure illustrates the relationship between KV cache size and the performance of Lexico on Llama models on GSM8K 5-shot evaluation. Lexico consistently outperforms both eviction-based methods (SnapKV, PyramidKV) and quantization-based methods (per-token quantization, KIVI, ZipCache).

Transformers (Vaswani et al., 2017) have become the backbone of frontier Large Language Models (LLMs), driving progress in domains beyond natural language processing. However, Transformers are typically limited by their significant memory requirements. This stems not only from the large number of model parameters, but also from having to maintain the KV cache, which grows in proportion to the model size (i.e., the number of layers, the number of heads, and the embedding dimension) and the token length of the input.
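The sparse approximation step mentioned in the abstract can be sketched in a few lines. The following is a minimal, illustrative orthogonal matching pursuit, not Lexico's implementation: the dictionary `D` is random rather than learned, the sizes (`d = 64`, `n_atoms = 256`) are hypothetical, and the function name `omp` is chosen here for illustration. It shows how a sparsity level `s` directly controls how many (index, coefficient) pairs are stored per vector.

```python
import numpy as np

def omp(D, x, s):
    """Greedy orthogonal matching pursuit (illustrative sketch).

    D: (d, n_atoms) dictionary with unit-norm columns
    x: (d,) vector to approximate (e.g. one key or value vector)
    s: number of atoms to select (the sparsity level)
    Returns the selected atom indices and their coefficients.
    """
    residual = x.copy()
    support = []
    coeffs = np.zeros(0)
    for _ in range(s):
        # Pick the atom most correlated with the current residual.
        scores = np.abs(D.T @ residual)
        if support:
            scores[support] = -np.inf  # never reselect an atom
        support.append(int(np.argmax(scores)))
        # Refit all selected coefficients by least squares.
        coeffs, *_ = np.linalg.lstsq(D[:, support], x, rcond=None)
        residual = x - D[:, support] @ coeffs
    return np.array(support), coeffs

# Hypothetical sizes; a random dictionary stands in for a learned one.
rng = np.random.default_rng(0)
d, n_atoms = 64, 256
D = rng.normal(size=(d, n_atoms))
D /= np.linalg.norm(D, axis=0)  # normalize each atom to unit length
x = rng.normal(size=d)

for s in (4, 8, 16):
    idx, c = omp(D, x, s)
    rel_err = np.linalg.norm(x - D[:, idx] @ c) / np.linalg.norm(x)
    print(f"s={s:2d}  stored pairs={len(idx):2d}  relative error={rel_err:.3f}")
```

Only the `s` indices and coefficients need to be kept for each vector, so a smaller `s` means less memory at the cost of a larger reconstruction error. With a random dictionary and random input the errors printed here will be far worse than those of a dictionary trained on real KV vectors.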


Multi-Instance Multi-Label Learning with Application to Scene Classification

Neural Information Processing Systems

In this paper, we formalize multi-instance multi-label learning, where each training example is associated with not only multiple instances but also multiple class labels. Such a problem can occur in many real-world tasks; for example, an image usually contains multiple patches, each of which can be described by a feature vector, and the image can belong to multiple categories since its semantics can be recognized in different ways. We analyze the relationship between multi-instance multi-label learning and the learning frameworks of traditional supervised learning, multi-instance learning, and multi-label learning.
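To make the setting concrete, here is a small sketch of how one such training example might be represented, assuming NumPy arrays for the patch features. The class name `MIMLExample`, the helper `framework`, the feature sizes, and the label names are all hypothetical and are not taken from the paper. The helper shows how the other three frameworks arise as special cases when an example has a single instance, a single label, or both.

```python
from dataclasses import dataclass
import numpy as np

@dataclass
class MIMLExample:
    instances: np.ndarray  # shape (n_instances, n_features), one row per patch
    labels: frozenset      # the set of class labels attached to the example

def framework(ex: MIMLExample) -> str:
    """Name the learning framework an example's shape corresponds to."""
    many_instances = ex.instances.shape[0] > 1
    many_labels = len(ex.labels) > 1
    if many_instances and many_labels:
        return "multi-instance multi-label"
    if many_instances:
        return "multi-instance"
    if many_labels:
        return "multi-label"
    return "traditional supervised"

rng = np.random.default_rng(0)
# An image described by 9 patches of 15 features each, with two scene labels.
image = MIMLExample(rng.normal(size=(9, 15)), frozenset({"mountain", "sunset"}))
# The same kind of data reduced to one feature vector and one label.
single = MIMLExample(rng.normal(size=(1, 15)), frozenset({"mountain"}))

print(framework(image))
print(framework(single))
```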

